\section{Intel x86 Architecture}
Even though a more detailed familiarity with the subject is required for a full
understanding of the design and implementation of the Muen kernel, this section
tries to give a short tour of the Intel x86 architecture. The basic components
of a modern Intel x86\index{x86} computer are depicted in figure
\ref{fig:intel-architecture}. The interested reader is directed to the Intel SDM
\cite{IntelSDM}, which contains a complete and in-depth description of the x86
architecture.

\begin{figure}[h]
	\centering
	\input{graph_intel_architecture}
	\caption{Intel architecture}
	\label{fig:intel-architecture}
\end{figure}

\subsection{Processor}
The main processor of the system is called the central processing unit
(CPU\index{CPU}). Multi-processor systems (MP\index{MP}) have multiple CPUs
which in turn can have multiple cores. If Intel Hyper-Threading Technology
(HTT\index{HTT}) is supported, each core has two or more so called
\emph{logical CPUs}\index{logical CPU}.

The logical CPUs run processes by executing instructions. These instructions
change the state of the logical CPU and the system as a whole.

\subsubsection{Execution Environment}\label{subsubsec:exec-env}
A program executing on a processor is given access to resources to store and
retrieve information, such as code or data. Resources provided by a modern
64-bit processor constituting the basic execution environment\index{execution
environment} are described in the following list:

\begin{description}
	\item[Memory address space] is used by a program to access RAM\index{RAM}.
	\item[Stack] is located in memory and facilitates subprogram calls and
		parameter passing in conjunction with stack management resources of the
		processor.
	\item[Execution registers] constitute the core of the execution environment.
		They consist of sixteen general-purpose, six segment and one flag
		register as well as the instruction pointer.
	\item[Control registers] define the current processor operating mode and
		other properties of the execution environment.
	\item[Descriptor table registers] are used to store the location of memory
		management and interrupt handling data structures.
	\item[Debug registers] allow a program to control and use the debugging
		functionality of a processor.
	\item[x87 FPU registers] offer support for floating-point operations.
	\item[MMX registers] offer support for single-instruction, multiple-data
		(SIMD) operations on integer numbers.
	\item[XMM registers] offer support for SIMD operations on floating-point
		numbers\footnote{Processors may contain more advanced vector support
		(e.g. YMM)}.
	\item[Model-specific registers (MSRs)] provide control of various hardware
		and software-related features. Their number and functionality varies
		depending on the given processor implementation.
	\item[I/O ports] provide communication with devices on a different address
		space.
\end{description}

Further details about the Intel processor execution environment can be found in
\cite{IntelSDM}, volume 1, section 3.2.1. All instructions provided by a
processor are called an instruction set. The complete Intel instruction set
architecture (ISA)\index{ISA} is specified in \cite{IntelSDM}, volumes 2A-C.

\subsubsection{Caches}
The processor has numerous internal caches\index{cache} and buffers of varying
sizes and properties. Their main purpose is to increase processor performance
by hiding latencies of memory accesses, e.g. reading data from RAM. The most
important caches are outlined in the following list:

\begin{itemize}
	\item Level 1 instruction cache
	\item Level 1 data cache
	\item Level 2 \& 3 unified caches
	\item Translation lookaside buffers (TLB\index{TLB})
	\item Store buffer
	\item Write Combining buffer
	\item Branch Prediction Cache (BPC\index{BPC})
\end{itemize}

Since caches are shared and because they can only be controlled to a limited
degree, their state can be changed and observed by different parts of a system.
Thus, some of the caches can be used as side-/covert-channels\index{channels}
and pose a challenge to effective component isolation.

\subsubsection{Protected Mode}
A modern CPU provides several operating modes of which one is called
\emph{protected mode}\index{protected mode}. In this mode the processor
provides different privilege levels also called rings\index{rings}. The rings
are numbered from 0 to 3, with ring 0 having the most privileges. Common
operating systems use only rings 0 and 3 while disregarding other privilege
levels.

Privileged instructions such as switching memory management data structures are
only executable in ring 0, also called \emph{supervisor mode}\index{supervisor
mode}. Operating systems such as Linux usually execute user applications in the
unprivileged ring 3, called \emph{user mode}\index{user mode}.

\subsubsection{IA-32e Mode}
The Intel 64 architecture runs in a processor mode named
\emph{IA-32e}\index{IA-32e}, also known as \emph{long mode}\index{long mode}.
It has two submodes: compatibility and 64-bit mode.  The first submode allows
the execution of most 16 and 32-bit applications without re-compilation, by
essentially providing the same execution environment as in 32-bit protected
mode.

The IA-32e 64-bit submode enables 64-bit kernels and operating systems to run
applications making use of the full 64-bit linear address space. The execution
environment is extended by additional general purpose registers and most
register sizes are extended to 64 bits.

In the context of this project, IA-32e mode generally refers to the 64-bit
IA-32e submode if not stated differently. Additional information about this mode
of operation and the execution environment can be found in \cite{IntelSDM},
volume 1, section 3.2.1.

\subsection{Memory Management}
Current processors support management of physical memory through means of
\emph{segmentation}\index{segmentation} and \emph{paging}\index{paging}.
Application processes running on a processor are provided with a virtual
address space. The processes are given the illusion that they run alone on a
system and have unrestricted access to system memory. Memory access of
processes are translated using the aforementioned segmentation and paging
mechanisms.

Memory management is done in hardware by the memory management unit
(MMU)\index{MMU}.  An operating system must set up certain data structures
(descriptor tables and page tables) to instruct the MMU how logic and linear
memory addresses map to physical memory.

Logical addresses are transformed to linear addresses by adding an offset to
the given virtual address. This mechanism is called segmentation. In IA-32e
mode, segmentation has effectively been dropped in favor of paging, creating a
flat 64-bit linear address space.

Paging is the address translation process of mapping linear to physical
addresses. Memory is organized in so called pages. A memory page is the unit
used by the MMU to map addresses. IA-32e mode provides different page sizes
such as 4 KB, 2 MB and 1 GB.

\begin{figure}[h]
	\centering
	\input{graph_address_translation}
	\caption{One-level address translation}
	\label{fig:address-translation}
\end{figure}

An exemplary one-level address translation process is illustrated in figure
\ref{fig:address-translation}. The MMU splits the linear address in distinct
parts which are used as indexes into page tables. A page table entry specifies
the address of a physical memory page called \emph{page frame}.  Addition of the
offset part of the input address to the page frame yields the effective physical
address.

Paging structures can be arranged hierarchically by letting page table entries
reference other paging structures instead of physical memory pages. In fact,
Intel's IA-32e mode uses four levels of address mapping, which are presented in
the following list:

\begin{itemize}
	\item Page map level 4 (PML4)\index{PML4} references a page directory
		pointer table. The address of the currently active PML4 is stored in
		the CPU's CR3 control register.
	\item Page directory pointer table (PDPT\index{PDPT}) references a page
		directory or a 1 GB page frame.
	\item Page directory (PD\index{PD}) references a page table or a 2 MB page
		frame.
	\item Page table (PT\index{PT}) references a 4 KB page frame.
\end{itemize}

A linear memory address is split into five parts, when all four levels of paging
structures are used. The first element identifies the PML4 entry, the second
references the PDPT entry and so on. The fifth part constitutes the offset added
to the page frame address. Larger page sizes are supported by letting higher
level paging structures reference physical memory pages. In such a case, all
remaining parts of the linear memory address are used as offset into the larger
page frame.

Additionally, paging structure entries allow to specify properties and
permissions such as write access or caching behavior for the referenced
physical memory page. The permissions are checked and enforced by the MMU.

The paging mechanism enables fine-grained memory management on a per-process
basis. The use of different paging structures depending on the currently
executing process allows for partitioning and separation of system memory using
the MMU. This can be achieved by changing the value of the processor's CR3
register whenever a process switch occurs.

If a process tries to access a memory location that has no valid address
translation, the processor raises a page fault exception. The operating system's
page fault handler is then in charge to correct the failure condition. It can
seamlessly resume the execution of the faulting process after handling the
exception. This technique is used by many modern operating systems such as Linux
to implement dynamic memory management.

For a more in-depth look at memory management and address translation the reader
is referred to \cite{IntelSDM}, volume 3A sections 3 \& 4 and
\cite{Drepper07whatevery}.

\subsection{Exceptions and Interrupts}\label{subsec:exceptions-interrupts}
Exceptions\index{exceptions} and interrupts\index{interrupts} signal an event
which occurred in the system or a processor that needs attention. They can
occur at any time and are normally handled by preempting the currently running
process and transferring execution to a special \emph{interrupt service routine}
(ISR\index{ISR}).

Each exception or interrupt has an associated number in the range of 0 to 255,
which is called the interrupt vector\index{interrupt vector}. Numbers in the
range of 0 to 31 are reserved for the Intel architecture. They are used to
uniquely identify exceptions, see Intel SDM \cite{IntelSDM}, volume 3A, section
6.15.  The remaining vectors can be freely used by hardware devices and the
operating system. To avoid vector number clashes hardware interrupts are offset
by 32 to move them out of the reserved range. As an example, the hardware
interrupt 0 (timer) would be mapped to 32, 1 (keyboard) to 33 and so on. As a
result the range of hardware interrupt numbers is restricted to 0 .. 223.

Interrupts are caused by hardware devices\index{device} to notify the processor
that a device needs servicing. An example is a network card that generates an
interrupt whenever a data packet is received from the network. The interrupt
handler responsible for servicing the network card is invoked upon recognition
of the event which then acknowledges further processing of the received data to
the device.

Exceptions are generated by the processor itself when it detects an error
condition during instruction execution. Causes for exceptions are for example
division by zero or page faults.

Interrupts generated by external devices can be blocked by disabling the
processor's IF\index{IF} flag in the FLAGS\index{FLAGS} register. When the flag
is not set, such interrupts are not recognized by the processor until the flag
is enabled again.  Exceptions however are not affected by the IF bit and are
processed as usual.

The main data structure which facilitates interrupt handling is the
\emph{interrupt descriptor table} (IDT)\index{IDT}. It is a list of entries
which point to interrupt handler procedures. The IDT can be located anywhere in
system memory and its physical address must be stored in the IDT register
(IDTR\index{IDTR}). When the processor receives an interrupt, it uses the
interrupt vector as an index into the IDT to determine the associated handler.
Execution control is then transferred by invoking the handler procedure.

\subsection{Programmable Interrupt Controller}\label{subsec:apic}
A Programmable Interrupt Controller (PIC\index{PIC}) is used to connect several
interrupt sources to a CPU. Hardware devices raise interrupts to inform a CPU
that some event occurred which must be handled. One of the best known PICs is
the Intel 8259A which was part of the original PC\footnote{Personal
computer}\index{PC} introduced in 1981.

Early PC/XT ISA\index{ISA} systems used one 8259 controller, allowing only
eight interrupt input lines. By cascading multiple controllers, more lines can
be made available, but a more flexible approach was needed with the advent of
multi-processor (MP\index{MP}) systems containing multiple cores. Implementing
efficient interrupt routing on an MP system using PICs was problematic.

Intel presented the Advanced Programmable Interrupt Controller
(APIC\index{APIC}) concept with the introduction of the Pentium processor. It
was one of several attempts to solve the interrupt routing efficiency problem.
The Intel APIC system is composed of two components: a local APIC
(LAPIC\index{LAPIC}) in each CPU of the system and the I/O APIC\index{I/O APIC},
see figure \ref{fig:apic} for a schematic view.

\begin{figure}[h]
	\centering
	\input{graph_apic}
	\caption{Local APICs and I/O APIC}
	\label{fig:apic}
\end{figure}

A LAPIC receives interrupts from internal sources (such as the timer) and from
the I/O APIC or other external interrupt controllers. It sends these interrupts
to the processor core for handling. In MP systems, the LAPIC also sends and
receives \emph{inter-processor interrupt} messages (IPI\index{IPI}) from and to
other logical processors on the system bus. Each local APIC has a unique ID
called APIC ID\index{APIC ID}. This ID is assigned by system hardware at power
up.

The I/O APIC is used to receive external interrupts from the system and its
associated devices and relays them to the local APICs. The LAPICs are connected
to the I/O APIC via the system bus. The routing of interrupts to the appropriate
LAPICs is configured using a redirection table which must be set up by the
operating system. For more information about APIC and I/O APIC see Intel SDM
\cite{IntelSDM}, volume 3A, chapter 10.

\subsection{Chipset connectivity}
The Intel hardware architecture is constantly evolving. More and more
functionality is moved into the CPU to increase performance. With the advent of
the Intel 5 Series computing architecture \cite{wiki:intel:5:series}, the
CPU\index{CPU} is directly connected to RAM and some PCI Express
(PCIe\index{PCIe}) devices. The memory controller is integrated straight into
the processor, providing fast access to system memory.  The memory controller
arbitrates RAM access and forwards requests to other resources, e.g.
memory-mapped devices to the platform controller hub (PCH).

Similarly, the display controller is either fully integrated into the processor
or connected via PCIe lanes. The CPU and the PCH are linked via the Direct
Media Interface (DMI\index{DMI}).

The platform controller hub\index{PCH} provides connections to most devices and
platform peripherals such as keyboards, network adapters, USB connections and
other buses (PCIe etc). It also features the system clock.
